Tile Reduction: The First Step towards Tile Aware Parallelization in OpenMP
نویسندگان
چکیده
Tiling is widely used by compilers and programmer to optimize scientific and engineering code for better performance. Many parallel programming languages support tile/tiling directly through first-class language constructs or library routines. However, the current OpenMP programming language is tile oblivious, although it is the de facto standard for writing parallel programs on shared memory systems. In this paper, we introduce tile aware parallelization into OpenMP. We propose tile reduction, an OpenMP tile aware parallelization technique that allows reduction to be performed on multi-dimensional arrays. The paper has three contributions: (a) it is the first paper that proposes and discusses tile aware parallelization in OpenMP. We argue that, it is not only necessary but also possible to have tile aware parallelization in OpenMP; (b) the paper introduces the methods used to implement tile reduction, including the required OpenMP API extension and the associated code generation techniques; (c) we have applied tile reduction on a set of benchmarks. The experimental results show that tile reduction can make parallelization more natural and flexible. It not only can expose more parallelism in a program, but also can improve its data locality.
منابع مشابه
University of Delaware Department of Electrical and Computer Engineering Computer Architecture and Parallel Systems Laboratory Tile Reduction: an OpenMP Extension for Tile Aware Parallelization
Tiling is widely used by compilers and programmer to optimize scientific and engineering code for better performance. Many parallel programming languages support tile/tiling directly through first-class language constructs or library routines. However, the current OpenMP programming language is tile oblivious, although it is the de facto standard for writing parallel programs on shared memory s...
متن کاملTile Percolation: An OpenMP Tile Aware Parallelization Technique for the Cyclops-64 Multicore Processor
Programming a multicore processor is difficult. It is even more difficult if the processor has software-managed memory hierarchy, e.g. the IBM Cyclops-64 (C64). A widely accepted parallel programming solution for multicore processor is OpenMP. Currently, all OpenMP directives are only used to decompose computation code (such as loop iterations, tasks, code sections, etc.). None of them can be u...
متن کاملAutomatic multilevel parallelization using OpenMP
In this paper we describe tl, e eaietlsion of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by tile NanosCompiler t,, allow for directive nesting and definition of thread g r,,ups. We report first results for several benchmark _odes and one full application that have been par, d/eliz...
متن کاملOpenMP for next generation heterogeneous clusters
The last years have seen great diversity in new hardware with e. g. GPUs providing multiple times the processing power of CPUs. Programming GPUs or clusters of GPUs however is still complicated and time consuming. In this paper we present extensions to OpenMP that allow one program to scale from a single multi-core CPU to a many-core cluster (e. g. a GPU cluster). We extend OpenMP with a new sc...
متن کاملEstimation of the aerobic capacity by step test in the workers of a tile factory in Yazd in 2017
Object: Estimation of the Maximum Aerobic Capacity to Make a Physiological Proportion between Worker and the work is of calculation of the Maximum Aerobic Capacity is important to Make a Physiological Proportion between Worker and his work The purpose of this study is to estimate the highest aerobic capacity and Physical work capacityof tile and ceramic workers . Analysis method :In this cros...
متن کامل